117 research outputs found

    Global disease monitoring and forecasting with Wikipedia

    Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data such as social media and search queries are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with $r^2$ up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.
    Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein and adjust novelty claims accordingly; revise title; various revisions for clarity.
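
    The abstract above mentions linear models scored by r^2 and evaluated at forecast horizons of up to 28 days. As a rough illustration only, the sketch below fits a one-predictor least-squares line from article views to case counts and reports r^2 at a chosen lag. It is not the authors' pipeline: the function names, the single-predictor setup, and the toy series are all assumptions made for this example.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line on (x, y)."""
    y_bar = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def lagged_pairs(views, cases, lag):
    """Pair views at step t with cases at step t + lag (lag >= 0)."""
    if lag == 0:
        return list(views), list(cases)
    return list(views[:-lag]), list(cases[lag:])


if __name__ == "__main__":
    # Hypothetical toy series, invented for illustration (not real data).
    views = [120, 150, 210, 320, 400, 380, 290, 200]
    cases = [10, 12, 15, 21, 32, 40, 38, 29]
    for lag in (0, 1):
        x, y = lagged_pairs(views, cases, lag)
        slope, intercept = fit_line(x, y)
        print(f"lag={lag} r^2={r_squared(x, y, slope, intercept):.3f}")
```

    Comparing r^2 across lags is one simple way to ask whether page views carry information about later case counts; a real analysis would need held-out data rather than in-sample fit.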

    Epidemiological data challenges: planning for a more robust future through data standards

    Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: 1) interfaces, 2) data formatting, and 3) reporting. These challenges are used to provide suggestions and guidance for improvement as these systems evolve in the future. If these suggested data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which can in turn yield better public health decision-making capabilities.
    Comment: v2 includes several typo fixes; v3 adds a paragraph on backfill; v4 adds 2 new paragraphs to the conclusion that address Frontiers reviewer comments; v5 adds some minor modifications that address additional reviewer comments.

    Forecasting the 2013–2014 Influenza Season using Wikipedia

    Infectious diseases are one of the leading causes of morbidity and mortality around the world; thus, forecasting their impact is crucial for planning an effective response strategy. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza affects between 5% and 20% of the U.S. population and causes major economic impacts resulting from hospitalization and absenteeism. Understanding influenza dynamics and forecasting its impact is fundamental for developing prevention and mitigation strategies. We combine modern data assimilation methods with Wikipedia access logs and CDC influenza-like illness (ILI) reports to create a weekly forecast for seasonal influenza. The methods are applied to the 2013–2014 influenza season but are sufficiently general to forecast any disease outbreak, given incidence or case count data. We adjust the initialization and parametrization of a disease model and show that this allows us to determine systematic model bias. In addition, we provide a way to determine where the model diverges from observation and evaluate forecast accuracy. Wikipedia article access logs are shown to be highly correlated with historical ILI records and allow for accurate prediction of ILI data several weeks before it becomes available. The results show that prior to the peak of the flu season, our forecasting method projected the actual outcome with a high probability. However, since our model does not account for re-infection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of flu season has passed.
    Comment: Second version. In the previous version, 2 figure references compiled incorrectly due to an error in the LaTeX source.

    Development of 11-Plex MOL-PCR Assay for the Rapid Screening of Samples for Shiga Toxin-Producing Escherichia coli

    Strains of Shiga toxin-producing Escherichia coli (STEC) are a serious threat to health, with approximately half of the STEC-related food-borne illnesses attributable to contaminated beef. We developed an assay that was able to screen samples for several important STEC-associated serogroups (O26, O45, O103, O104, O111, O121, O145, O157) and three major virulence factors (eae, stx1, stx2) in a rapid and multiplexed format using the Multiplex oligonucleotide ligation-PCR (MOL-PCR) assay chemistry. This assay detected unique STEC DNA signatures and is meant to be used on samples from various sources related to beef production, providing a multiplex and high-throughput complement to the multiplex PCR assays currently in use. Multiplex oligonucleotide ligation-PCR (MOL-PCR) is a nucleic acid-based assay chemistry that relies on flow cytometry/image cytometry and multiplex microsphere arrays for the detection of nucleic acid-based signatures present in target agents. The STEC MOL-PCR assay provided greater than 90% analytical specificity across all sequence markers designed when tested against panels of DNA samples that represent different STEC serogroups and toxin gene profiles. This paper describes the development of the 11-plex assay and the results of its validation. This highly multiplexed and, more importantly, dynamic and adaptable screening assay allows inclusion of additional signatures as they are identified in relation to public health. As the impact of STEC-associated illness on public health is explored, additional information on classification will be needed on single samples; thus, this assay can serve as the backbone for a complex screening system.

    Development of 11-Plex MOL-PCR Assay for the Rapid Screening of Samples for Shiga Toxin-Producing Escherichia coli

    Citation: Woods, T. A., Mendez, H. M., Ortega, S., Shi, X. R., Marx, D., Bai, J. F., . . . Deshpande, A. (2016). Development of 11-Plex MOL-PCR Assay for the Rapid Screening of Samples for Shiga Toxin-Producing Escherichia coli. Frontiers in Cellular and Infection Microbiology, 6, 12. doi:10.3389/fcimb.2016.00092

    The Biosurveillance Analytics Resource Directory (BARD): Facilitating the Use of Epidemiological Models for Infectious Disease Surveillance

    Epidemiological modeling for infectious disease is important for disease management, and its routine implementation needs to be facilitated through better description of models in an operational context. A standardized model characterization process that allows selection or manual comparison of available models and their results is currently lacking. A key need is a universal framework to facilitate model description and understanding of its features. Los Alamos National Laboratory (LANL) has developed a comprehensive framework that can be used to characterize an infectious disease model in an operational context. The framework was developed through a consensus among a panel of subject matter experts. In this paper, we describe the framework, its application to model characterization, and the development of the Biosurveillance Analytics Resource Directory (BARD; http://brd.bsvgateway.org/brd/), to facilitate the rapid selection of operational models for specific infectious/communicable diseases. We offer this framework and associated database to stakeholders of the infectious disease modeling field as a tool for standardizing model description and facilitating the use of epidemiological models.

    Results from the Centers for Disease Control and Prevention's Predict the 2013-2014 Influenza Season Challenge

    Background: Early insights into the timing of the start, peak, and intensity of the influenza season could be useful in planning influenza prevention and control activities. To encourage development and innovation in influenza forecasting, the Centers for Disease Control and Prevention (CDC) organized a challenge to predict the 2013-14 United States influenza season. Methods: Challenge contestants were asked to forecast the start, peak, and intensity of the 2013-2014 influenza season at the national level and at any or all Health and Human Services (HHS) region level(s). The challenge ran from December 1, 2013-March 27, 2014; contestants were required to submit 9 biweekly forecasts at the national level to be eligible. The selection of the winner was based on expert evaluation of the methodology used to make the prediction and the accuracy of the prediction as judged against the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). Results: Nine teams submitted 13 forecasts for all required milestones. The first forecast was due on December 2, 2013; 3/13 forecasts received correctly predicted the start of the influenza season within one week, 1/13 predicted the peak within 1 week, 3/13 predicted the peak ILINet percentage within 1%, and 4/13 predicted the season duration within 1 week. For the prediction due on December 19, 2013, the number of forecasts that correctly forecasted the peak week increased to 2/13, the peak percentage to 6/13, and the duration of the season to 6/13. As the season progressed, the forecasts became more stable and were closer to the season milestones. Conclusion: Forecasting has become technically feasible, but further efforts are needed to improve forecast accuracy so that policy makers can reliably use these predictions. CDC and challenge contestants plan to build upon the methods developed during this contest to improve the accuracy of influenza forecasts. © 2016 The Author(s)

    Recommended reporting items for epidemic forecasting and prediction research: the EPIFORGE 2020 guidelines

    Funding: MIDAS Coordination Center and the National Institutes of General Medical Sciences (NIGMS 1U24GM132013) for supporting travel to the face-to-face consensus meeting by members of the Working Group. NGR was supported by the National Institutes of General Medical Sciences (R35GM119582). Travel for SV was supported by the National Institutes of General Medical Sciences (1U24GM132013-01). BMA was supported by Bill & Melinda Gates through the Global Good Fund. RL was funded by a Royal Society Dorothy Hodgkin Fellowship.
    Background: The importance of infectious disease epidemic forecasting and prediction research is underscored by decades of communicable disease outbreaks, including COVID-19. Unlike other fields of medical research, such as clinical trials and systematic reviews, no reporting guidelines exist for reporting epidemic forecasting and prediction research despite their utility. We therefore developed the EPIFORGE checklist, a guideline for standardized reporting of epidemic forecasting research. Methods and findings: We developed this checklist using a best-practice process for development of reporting guidelines, involving a Delphi process and broad consultation with an international panel of infectious disease modelers and model end users. The objectives of these guidelines are to improve the consistency, reproducibility, comparability, and quality of epidemic forecasting reporting. The guidelines are not designed to advise scientists on how to perform epidemic forecasting and prediction research, but rather to serve as a standard for reporting critical methodological details of such studies. Conclusions: These guidelines have been submitted to the EQUATOR network, in addition to hosting by other dedicated webpages to facilitate feedback and journal endorsement.
    Publisher PDF. Non peer reviewed.

    The Eleventh and Twelfth Data Releases of the Sloan Digital Sky Survey: Final Data from SDSS-III

    The third generation of the Sloan Digital Sky Survey (SDSS-III) took data from 2008 to 2014 using the original SDSS wide-field imager, the original and an upgraded multi-object fiber-fed optical spectrograph, a new near-infrared high-resolution spectrograph, and a novel optical interferometer. All of the data from SDSS-III are now made public. In particular, this paper describes Data Release 11 (DR11) including all data acquired through 2013 July, and Data Release 12 (DR12) adding data acquired through 2014 July (including all data included in previous data releases), marking the end of SDSS-III observing. Relative to our previous public release (DR10), DR12 adds one million new spectra of galaxies and quasars from the Baryon Oscillation Spectroscopic Survey (BOSS) over an additional 3000 deg² of sky, more than triples the number of H-band spectra of stars as part of the Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE), and includes repeated accurate radial velocity measurements of 5500 stars from the Multi-object APO Radial Velocity Exoplanet Large-area Survey (MARVELS). The APOGEE outputs now include the measured abundances of 15 different elements for each star. In total, SDSS-III added 5200 deg² of ugriz imaging; 155,520 spectra of 138,099 stars as part of the Sloan Exploration of Galactic Understanding and Evolution 2 (SEGUE-2) survey; 2,497,484 BOSS spectra of 1,372,737 galaxies, 294,512 quasars, and 247,216 stars over 9376 deg²; 618,080 APOGEE spectra of 156,593 stars; and 197,040 MARVELS spectra of 5513 stars. Since its first light in 1998, SDSS has imaged over 1/3 of the celestial sphere in five bands and obtained over five million astronomical spectra. © 2015. The American Astronomical Society

    Mapping local patterns of childhood overweight and wasting in low- and middle-income countries between 2000 and 2017

    A double burden of malnutrition occurs when individuals, household members or communities experience both undernutrition and overweight. Here, we show geospatial estimates of overweight and wasting prevalence among children under 5 years of age in 105 low- and middle-income countries (LMICs) from 2000 to 2017 and aggregate these to policy-relevant administrative units. Wasting decreased overall across LMICs between 2000 and 2017, from 8.4% (62.3 (55.1–70.8) million) to 6.4% (58.3 (47.6–70.7) million), but is predicted to remain above the World Health Organization’s Global Nutrition Target of <5% in over half of LMICs by 2025. Prevalence of overweight increased from 5.2% (30 (22.8–38.5) million) in 2000 to 6.0% (55.5 (44.8–67.9) million) children aged under 5 years in 2017. Areas most affected by double burden of malnutrition were located in Indonesia, Thailand, southeastern China, Botswana, Cameroon and central Nigeria. Our estimates provide a new perspective to researchers, policy makers and public health agencies in their efforts to address this global childhood syndemic